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A real-time list processing system is one in 
which the time required by the elementary list 
operations (e.g. CONS, CAR, COR, RPLACA, 
RPLACD, EQ, and ATOM in LISP) is bounded by a 
(small) constant. Classical implementations of 
list processing systems lack this property 
because allocating a list cell from the heap may 
cause a garbage collection, which process 
requires time proportional to the heap size to 
finish. A real-time list processing system is 
presented which continuously reclaims garbage, 
including directed cycles, while linearizing and 
compacting the accessible cells into contiguous 
locations to avoid -fragmenting the free storage 
pool. The program is small and requires no 
time-sharing interrupts, making it suitable for 
microcode. Finally, the system requires the 
same average time, and not more than twice the 
space, of a classical implementation, and those 
space requirements can be reduced to 
approximately classical proportions by compact 
list representation. Arrays of different sizes, a 
program stack, and hash linking are simple 
extensions to our system, and reference counting 
is found to be inferior for many applications. 
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1. Introduction and Previous Work 

List processing systems such as LISP [25] have slowly 
gained popularity over the years in spite of some rather 
severe handicaps. First, they usually interpreted their 
programs instead of compiling them, thus increasing their 
running time by several orders of magnitude. Second, the 
storage structures used in such systems were inefficient in the 
use of storage; for example, compiling a program sometimes 
halved the amount of storage it occupied. Third, processing 
had to be halted periodically to reclaim storage by a long 
process known as garbage collection, which laboriously 
traced and marked every accessible cell so that those 
inaccessible cells could be recycled. 

That such inefficiencies were tolerated for so long is a 
tribute to the elegance and productivity gained by 
programming in these languages. These languages freed the 
programmer from a primary concern: storage management. 
The programmer had only to call CONS (or its equivalent) to 
obtain a pointer to a fresh storage block; even better, the 
programmer had only to relinquish all copies of the pointer 
and the storage block would automatically be reclaimed by 
the tireless garbage collector. The programmer no longer had 
to worry about prematurely freeing a block of storage which 
was still in use by another part of the system. 

The first problem was solved with the advent of good 
compilers [27,32] and new languages such as SIMULA 
especially designed for efficient compilation [1,5,14]. The 
second was also solved to some extent by those same 
compilers because the user programs could be removed from 
the list storage area and freed from its inefficient constraints 
on representation.' Other techniques such as compact list 
representation ("CDR-coding") [12,19] have been proposed 
which also offer partial solutions to this problem. 

This paper presents a solution to the third problem of 
classical list processing techniques and removes that 
roadblock to their more general use. Using the method given 
here, a computer could have list processing primitives built 
in as machine instructions and the programmer would still be 
assured that each instruction would finish in a reasonable 
amount of time. For example, the interrupt handler for a 
keyboard could store its characters on the same kinds of 
lists — and in the same storage area — as the lists of the main 
program. Since there would be no long wait for a garbage 
collection, response time could be guaranteed to be small. 
Even an operating system could use these primitives to 
manipulate its burgeoning databases. Business database 
designers no longer need shy away from pointer-based 
systems for fear that their systems will be impacted by a 
week-long garbage collection! As memory is becoming 
cheaper, 2 even microcomputers could be built having these 
primitives, so that the prospect of controlling one's kitchen 
stove with LISP is not so far-fetched. 

A real-time list processing system has the property that 
the time required by each of the elementary operations is 
bounded by a constant independent of the number of cells in 
use. This property does not guarantee that the constant will 
be small enough for a particular application on a particular 
computer, and hence has been called "pseudo-real-time" by 
some. However, since we are presenting the system 



Mn many cases, a rarely used program is compiled not to 
save time in its execution, but to save garbage-collected 
storage space. 

2 Work is progressing on 10 6 bit chips. 
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independent of a particular computer and application, it is the 
most that can be said. In all but the most demanding 
applications, the proper choice of hardware can reduce the 
constants to acceptable values. 

Except where explicitly stated, we will assume the 
classical Von Neumann serial computer architecture with real 
memory in this paper. This model consists of a memory, i.e. 
a one-dimensional array of words, each of which is large 
enough to hold (at least) the representation of a nonnegative 
integer which is an index into that array; and a central 
processing unit, or CPU, which has a small fixed number of 
registers the size of a word. The CPU can perform most 
operations on a word in a fixed, bounded amount of time. The 
only operations we require are load, store, add, subtract, test 
if zero, and perhaps some bit-testing. It is hard to find a 
computer today without these operations. 

As simple as these requirements are, they exclude virtual 
memory computers. These machines are interesting because 
they take advantage of the locality of reference effect, i.e. the 
nonzero serial correlation of accesses to memory, to reduce 
the amount of fast memory in a system without greatly 
increasing the average access time. However, the time 
required to load a particular word from virtual memory into a 
CPU register is not bounded because the primary memory may 
have to fetch it from a lower level memory. Since we are 
more interested in tight upper bounds, rather than average 
performance, this class of machines is excluded. 

Since the primary list processing language in use today 
is LISP, and since most of the literature uses the LISP 
paradigm when discussing these problems, we will continue 
this tradition and center our discussion around it. Due to its 
small cells, which consist of 2 pointers apiece, LISP is also a 
kind of worst case for garbage collection overhead. 

There are two fundamental kinds of data in LISP: list cells 
and atoms. List cells are ordered pairs consisting of a car and 
a cdr, while atoms are indecomposable. ATOM(jc) is a 
predicate which is true if and only if x is an atom (i.e. if and 
only if x is not a list cell); EQ(xj;) is a predicate which is true 
if and only if* and y are the same object; CAR(x) and CDR(jc) 
return the car and cdr components of the list cell jc, 
respectively; CONS(jc^) returns a new list cell (not EQ to any 
other accessible list cell) whose car is initially x and whose 
cdr is initially y; RPLACA(x,y) and RPLACD(x,y) store y 
into the car and cdr of x, respectively. We assume here that 
these seven primitives are the only ones which can access or 
change the representation of a list cell. 

There have been several attempts to tackle the problem 
of real time list processing. Knuth [22,p.422] credits 
Minsky as the first to consider the problem, and sketches a 
multiprogramming solution in which the garbage collector 
shares time with the main list processing program. Steele's 
paper [30] was the first in a flurry of papers about 
multiprocessing garbage collection which included 
contributions by [16,17] and [23,24]. [28] independently 
detailed the Minsky-Knuth-Steele method, and both [28] and 
[33] analyzed the time and storage required to make it work. 

The Minsky-Knuth-Steele-Muller-Wadler (MKSMW) 
method for real-time garbage collection has two processes 
running in parallel. The list processor process is called the 
mutator while the garbage collector is called the collector 
(these terms are due to [16]). The mutator executes the user's 
program while the collector performs garbage collection over 
and over again. The collector has three phases: mark, sweep, 
and relocate. During the mark phase, all accessible storage is 
marked as such, and any inaccessible storage is picked up 
during the sweep phase. The relocate phase relocates 



accessible cells in such a way as to minimize the address 
space required. Since the mutator continues running while the 
mark and relocate phases proceed, the free list must be long 
enough to keep the mutator from starvation. During the 
sweep phase, cells must be added to the free list faster than 
they can be taken off, on the average, lest the net gain in 
cells from that garbage collection cycle be negative. 

The behavior of this algorithm under equilibrium 
conditions (which is when a cell is let go for every cell 
CONSED, and when the rates of cell use by the mutator, and of 
marking, sweeping, and relocating by the collector, are all 
constant) was studied in [28] and [33]. If we let m be the ratio 
of the rate of CONSING to that of marking, s be the ratio of 
the rate of CONSING to that of sweeping, and r be the ratio of 
the rate of CONSING to that of relocating, then we can derive 
estimates of the size of storage needed to support an 
accessible population of N cells under equilibrium 
conditions. 3 Using these assumptions, we derive: 

Maximum MKSMW Storage Required 

„ „ m+(m+\)(r+\) . , „ 

— l-j(r+l) — + S1Ze c °N ector stac k 

We note that r=0 if there is no relocation (i.e. it happens 
instantaneously), and we have the simpler expression: 

Maximum MKSMW Storage Required 

< N^~^- + size of collector stack 
l -s 

The collector stack seems to require depth N to handle the 
worst case lists that can arise, but each position on the stack 
need only hold one pointer. Since a LISP cell is two 
pointers, the collector stack space requirement is 0.5//. 
Thus, we arrive at the inequality: 

Maximum MKSMW Storage Required < N L5 ^'' 5s 

These estimates become bounds for nonequilibrium 
situations so long as the ratios of the rate of CONSING to the 
rates of marking, sweeping, and relocating are constant; i.e. 
we relativize the rates of marking, sweeping, and relocating 
with respect to a CONS counter rather than a clock. 

The Dijkstra-Lamport (DL) method [16,17,23,24] also 
has the mutator and collector running in parallel, but the 
collector uses no stack. It marks by scanning all of storage 
for a mark bit it can propagate to the marked cell's offspring. 
This simple method of garbage collection was considered 
because their concern was proving that the collector actually 
collected only and all garbage. Due to its inefficiency, we 
will not consider the storage requirements of this method. 

Both the MKSMW and the DL methods have the drawback 
that they are parallel algorithms and as a result are incredibly 
hard to analyze and prove correct. By contrast, the method 
we present is serial, making analyses and proofs easy. 

2. The Method 

Our method is based on the Minsky garbage collection 
algorithm [26], used by Fenichel and Yochelson in an early 
Multics LISP [18], elegantly refined in [11], and applied by 
Arnborg to SIMULA [1]. This method divides the list space 
into two semispaces. During the execution of the user 
program, all list cells are located in one of the semispaces. 
When garbage collection is invoked, all accessible cells are 
traced, and instead of simply being marked, they are moved to 



3 Of course s<\, or else the storage required is infinite. 
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the other semispace. A forwarding address is left at the old 
location, and whenever an edge is traced which points to a 
cell containing a forwarding address, the edge is updated to 
reflect the move. The end of tracing occurs when all 
accessible cells have been moved into the "to" semispace 
(tospace) and all edges have been updated. Since the tospace 
now contains all accessible cells and the "from" semispace 
(fromspace) contains only garbage, the collection is done 
and the computation can proceed with CONS now allocating 
cells in the former fromspace. 

This method is simple and elegant because 1) it requires 
only one pass instead of three to both collect and compact, 
and 2) it requires no collector stack. The stack is avoided 
through the use of two pointers, B and S. B points to the first 
free word (the bottom) of the free area, which is always in the 
tospace. B is incremented by COPY, which transfers old cells 
from the fromspace to the bottom of the free area, and by 
CONS, which allocates new cells. S scans the cells in 
tospace which have been moved, and updates them by 
moving the cells they point to. S is initialized to point to 
the beginning of tospace at every flip of the semispaces and 
is incremented when the cell it points to has been updated. At 
all times, then, the cells between S and B have been moved, 
but their cars and cdrs have not been updated. Thus, when 
S-B all accessible cells have been moved into tospace and 
their outgoing pointers have been updated. This method of 
pointer updating is equivalent to using a queue instead of a 
stack for marking, and therefore traces a spanning tree of the 
accessible cells in breadth-first order. 

Besides solving the compaction problem for classical 
LISP, the Minsky-Fenichel-Yochelson-Cheney-Arnborg 
(MFYCA) method allows simple extensions to handle 
nonuniformly sized arrays and CDR-coding because free 
storage is kept in one large block. Allocation is therefore 
trivial; one simply adds n to the "free space pointer" to 
allocate a block of size n. 

Copying garbage collectors have been dismissed by 
many as requiring too much storage for practical use (because 



they appear to use twice as much as classical LISP), but we 
shall see that this judgement was, perhaps, premature. 

We present the MFYCA algorithm here in pseudo-AIgol- 
BCPL notation. The notation "oc[p] M means the contents of 
the word whose address is the value of a plus the value of p, 
i.e. the contents of a + p. If it appears on the left hand side 
of ":=", those contents are to be changed. Thus, p[i] refers to 
the ith component of the vector pointed to by p. The 
function size(p) returns the size of the array pointed to by p. 
The notation "a & p" is similar to the notation "a; p" in 
that a and p are executed in order; however, "a & p" returns 
the value of a rather than the value of p. Thus, ";" and "&" 
are the duals of one another: "a]; 0:2; a n " returns the last 
value (that of 0^) whereas "aj& CC2& a,," returns the first 
value (that of cti). 

Our conventions are these: the user program has a bank 
of NR registers R[l] t R[NR]. The user program may not 
"squirrel away" pointers outside of the bank R during a call to 
CONS because such pointers would become obsolete if 
garbage collection were to occur. (We will show later how to 
deal with a user program stack in such a way that the real-time 
properties of our system are not violated.) Pointers either are 
atoms or refer to cons cells in fromspace or tospace. A cons 
cell c is represented by a 2-vector of pointers: car(c)=c[0], 
cdr(c)=c[\l FLIP, FROMSPACE and TOSPACE are 
implementation-dependent routines. FLIP interchanges the 
roles of fromspace and tospace by causing CONS and COPY 
to allocate in the other semispace and the predicates 
FROMSPACE and TOSPACE to exchange roles. FLIP also 
has the responsibility of determining when the new tospace 
is too small to hold everything from the fromspace plus the 
newly consed cells. Before flipping, it checks if 
size(fromspace) is less than (\+m)[size(tospace)-(T-B)] y 
where T is the top of tospace, and if fromspace (the new 
tospace) is too small, either it must be extended, or the 
system may later stop with a "memory overflow" indication. 



% The Minsky-Fenichel-Yochelson-Cheney-Arnborg 
pointer B; 
pointer S; 
pointer T; 

pointer procedure CONS (x, y) = 
begin 

if B = T 
then 
begin 
flipO; 
for i=l to NR 

do R [i] : =move (R [i] ) ; 
x: =move (x) ; y : =move (y) ; 
while S<B 
do begin 

S[0] :=move(S[0]) ; 
S[l] :=move(S[l]) ; 
S:.=S+2 
end 

end; 

if B>T then error; 
B[0]:=x; B[l]:«y; 
B & (B:=B+2) 
end; 

pointer procedure CAR(x) s x[0] ; 



[26,18,11,1] Garbage Collector. 

% Bottom; points to bottom of free area. 
% Scan; points to first untraced cell. 
% Top; points to top of tospace. 
% Assertions: S<B<T and T-B is even. 
% Allocate the list cell (x.y) 

% If there is no more free space, 

% collect all the garbage. 

% This block is the "garbage collector". 

% Interchange semispaces. 

% Update all user registers. 

% Update our arguments. 
% Trace all accessible cells. 

% Update the car and cdr. 

% Point to next untraced cell. 



% Memory is full. 

% Create new cell at bottom of free area. 
% Return the current value of B 
% after stepping it to next cell. 
% A cell consists of 2 words: 
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pointer procedure CDR(x) s x[l3; 
procedure RPLACA(x,y) = x[0] :=y; 
procedure RPLACD {x, y) s x[l] :=y; 
boolean procedure EQ(x,y) s x=y; 
boolean procedure ATOM(x) s 

not tospace (x) ; 
pointer procedure MOVE(p) s 
if not fromspace(p) 
then p 
else begin 

if not tospace (p [0] ) 

then p[0] :=copy(p); 
pCO] 
end; 

pointer procedure COPY(p) = 
begin 

if B>r then error; 

B[0] :=p[0] ; B[l] :=p[l] ; 

B & (S:=B+2) 
end; 

% TOSPACE, FROMSPACE test whether a 



% car is 1st; cdr is 2nd. 
% car(jc) 
% cdr(jc) :=y 

% Are the same object? 
% Is x an atom? 

% Move p if not yet moved; return new 
% We only need to move old ones. 
% This happens a lot. 

% We must move /?. 

% Copy it into the bottom of free area. 

% Leave and return forwarding address. 

% Create a copy of a cell. 
% Allocate space at bottom of free area. 
% Memory full? 
% Each cell requires 2 words. 
% Return the current value of B 
% after moving it to next cell, 
pointer is in that semispace. 



In order to convert MFYCA into a real-time algorithm, 
we force the mark ratio m to be constant by changing CONS 
so that it does k iterations of the garbage collection loop 
before performing each allocation. But this means that both 
semispaces contain accessible cells at almost all times. In 
order to simplify the algorithm and the proof, we trick the 
user program into believing that garbage collection ran and 
finished at the time of the last flip; i.e. we assert that, as 
before, the user program sees addresses only in tospace. 

Some slight effort must be made to keep up this 
appearance. When the semispaces are interchanged, all the 
user program registers must be updated immediately to point 
to tospace. This gives the collector a head start on the 
mutator. Since the only operations that might violate our 
assertion are CAR and CDR, we make sure that CAR and CDR 



cause forwarding addresses to be followed, and cells to be 
moved, when necessary. This ensures that the mutator cannot 
pass the collector. It turns out that preserving our assertion 
is much simpler than preserving the corresponding 
assertions of DL [16,17,23,24]. In particular, RPLACA and 
RPLACD cannot cause any trouble at all! 

There is another problem caused by interleaving garbage 
collection with normal list processing: the new cells that 
CONS creates will be interleaved with those moved, thereby 
diluting the moved cells which must be traced by CONS. Of 
course, new cells have their cars and cdrs already in tospace 
and therefore do not need to be traced. We avoid this waste of 
trace effort through the use of the pointer T y which points to 
the top of the free area, and by allocating all new cells there. 



% Serial Real-Time System (SRT). 
integer k; 

pointer T; 

pointer procedure CONS (x, y) a 
begin 
if B=T 

then begin 

if S<B then error; 

flipO; 

for i=l to NR 

do R[i] :=move(R[i]); 
x:=move (x) ; y : =move (y) ; 
end; 

for i=l to k while S<B 
do begin 

S[0] :=move{S[0]) ; 
S[l] :=move(S[l]) ; 
S:=S+2 
end; 

if B=T then error; 
T := T-2; 

T[0] := X; T[l] :=y; 
T 

end; 



% Global trace ratio parameter: 

% the number of cells to trace per cons. 

% Top; Points to top of free area. 

% Do some collection, then allocate (jc.v). 

% Check if free area is empty. 

% Switch semispaces. Memory is full 

% if tracing is not finished. 

% Flip semispaces. 

% Update user registers 
% and our arguments. 

% Do k iterations of gc. 

% Update car and cdr. 

% Go on to next untraced cell. 



% Actually create the cell. 

% Move in car and cdr. 

% Return address of new cell. 
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pointer procedure CAR(x) s x[0] :=move (x[0] ) ; 
pointer procedure CDR(x) = x[l] :=move (x[l] ) ; 
% Procedures not redefined here are as before. 

The time required by all of the elementary list operations 
in this algorithm, with the exception of CONS, can easily be 
seen to be bounded by a constant because they are straight- 
line programs composed from primitives which are bounded 
by constants. CONS is also bounded by a constant because 
the number of mutator registers is a (small) fixed number (e.g. 
16), and the parameter k is fixed. In principle, given the 
number of registers and the parameter k, the two loops in 
CONS could be expanded into straight-line code; hence the 
time it requires is also bounded by a constant. 

The proof that the incremental collector eventually 
moves all accessible cells to tospace is an easy induction. 
Upon system initialization there are no accessible cells, 
hence none in tospace, and so we have a correct basis. 
Suppose that at some point in the computation we have just 
switched semispaces so that tospace is empty. Suppose 
further that there are N accessible cells in fromspace which 
must be moved to tospace. Now, every cell which was 
accessible at the time of flipping eventually gets moved 
when it is traced, unless lost through RPLACA and RPLACD, 
and as a result appears between S and B. Furthermore, a cell is 
moved only once, because when it is moved it leaves behind a 
forwarding address which prevents it from being moved 
again. When the pointer S reaches a cell, its edges are 
traced — i.e. the cells they point to are moved, if necessary. 
Finally, only cells which have been moved appear between S 
and B. Therefore, the number of those accessible, unmoved 
cells in fromspace decreases monotonically, eventually 
resulting in no accessible, unmoved cells in fromspace. At 
this point, the collector is done and can interchange the two 
semispaces. 

It should be easy to see why the other list operations 
cannot adversely affect the progress of the collector. A CAR 
or CDR can move a cell before the collector has traced it. But 
since moving it increases B but not S, it will be traced later. 
RPLACA and RPLACD can affect connectivity, but since all 
of their arguments are already in tospace, they have already 
been moved and may or may not have been traced. Consider 
RPL AC k{p,q). Suppose that p has been, traced and q has not. 
But since q has been moved but not traced, it must be between 
S and B and will not be missed. Suppose, on the other hand, 
that q has been traced and p has not. Then when p is traced, 
the old CAR of p will not be traced. But this is all right, 
because it may no longer be accessible. If it still is the target 
of an edge from some accessible cell, then it either already 
has, or will be, traced through that edge. Finally, if either 
both p and q have been traced or both have not been, there is 
obviously no problem. 

This algorithm can also be proved correct by the methods 
of DL [16,17,23,24], because this particular sequence of 
interleaving collection with mutation is only one of the legal 
execution sequences of the DL algorithm on a serial machine. 
Therefore, if the DL algorithm is correct, then so is this one. 
The correspondence is this: white nodes are those which 
reside in fromspace, i.e. those which have not yet been 
moved; grey nodes are those which have been moved but have 
not yet been traced, i.e. those between S and B; and black 
nodes are those which have been moved and traced, and those 
which have been allocated directly in tospace (cells below S 
or above 7). Then the assertions are: 



% Move, update and return x[0]. 
% Move, update and return jc[1]. 



A) each node will only darken monotonically; 

B) no edge will ever point from a black node to a white one; 
and 

C) the user program sees only grey or black nodes. 

We can now see why the burden is on CAR and CDR 
rather than RPLACA and RPLACD— the latter will not violate 
B so long as the former does not violate C. Using these 
assertions, we see that the mutator and the mark phase of the 
collector are essentially doing the same thing: tracing 
accessible cells. The difference is that the collector goes 
about it systematically whereas the mutator wanders. Thus, 
only the collector knows for sure when all the cells in 
fromspace have been traced so that the two semispaces can be 
interchanged. Assertion C also allows CAR and CDR to 
update a cell in which a pointer to fromspace is found, thus 
reducing pointer-chasing for cells which are accessed more 
than once. 

We must now analyze the storage required by this 
algorithm. Suppose that at some flip of the semispaces there 
are AT accessible nodes. Then the collector will not have to 
move or trace any more than N cells. If it traces (makes 
black) exactly k cells per CONS, then when the collector has 
finished, the new semispace will contain <N+N/k=N(\+m) 
cells, letting m-\lk. If only N of these are accessible, as in 
equilibrium conditions, then the next cycle of the collector 
will copy those N cells back to the first semispace, while 
performing Nm CONSES. Hence, we have the inequality: 

Maximum SRT Storage Required < N(2+2m) = N(2+2/k) 

Therefore, for a program which has a maximum cell 
requirement of N cells operating on a fixed-size real memory 
of 2M cells, the parameter k must be greater than N/(M-N) to 
guarantee that tracing is finished before every flip. 

If we compare the bound for our algorithm with the bound 
for MKSMW, using the unlikely assumption that sweeping 
and relocation take no time (s=r~0) > we find that they are 
quite similar in storage requirements. 

Maximum MKSMW Storage Required < N(1.5+2m) 

Maximum SRT Storage Required < N(2+2m) 

If m~\ (which corresponds to one collector iteration per 
CONS), the two algorithms differ by only 1 part in 8, which 
is insignificant given the gross assumptions we have made 
about MKSMW's sweeping and relocation speeds. It is not 
likely that the storage requirements of a MKSMW-type 
algorithm can be significantly improved because it cannot 
take advantage of techniques such as stack threading or CDR- 
coding. Stack threading cannot be done, because accessible 
cells have both their car and cdr in use. 4 CDR-coding using 
MKSMW is very awkward because CONS must search for a 
free cell of the proper size and location before allocating a 
cell, since the free space is fragmented. On the other hand, 
our algorithm can be easily modified to use CDR-coding and 
thereby reduce storage requirements to approximately 
N(\+m). 



4 The Deutsch-Schorr-Waite collector [22,p.417-418] 
"threads" the stack but temporarily reverses the list structure, 
thus locking out the mutator for the duration. 
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3. The Parameter m (= 1/*) 

If k is a positive integer, then the parameter m (=l/fc) 
will lie in the interval 0<m<l. Therefore, the factor of 1+ro 
in our bounds must lie between 1 and 2. This means that the 
storage requirements for our method can be adjusted by 
varying k t but they will not vary by more than a factor of 2 as 
long as k is integral. Now, the time to execute CONS is 
proportional to A+c, for some suitable constant c. Therefore, 
one can trade off storage for CONS speed, but only within 
this limited range. Furthermore, as k rises above 4 the 
storage savings become insignificant; e.g. doubling k to 8 
yields a storage savings of only 10%, yet almost doubles 
CONS time. Of course, if storage is limited and response time 
need not be fast, larger A's might be acceptable. 

If the method is used for the management of a large 
database residing on secondary storage, k could be made a 
positive rational number less than 1, on the average. For 
example, to achieve an average fc=l/3 (m=3), one could have 
CONS perform an iteration of the collector only every third 
time it was called. The result of this would double the storage 
required (m+l=4), but would reduce the average CONS time by 
almost 2/3. Of course, the worst case time performance for 
CONS would still be the same as though k were 1 . 

This improvement is significant because each iteration 
of the collector traces all the pointers of one record. This 
requires retrieving that record, updating all of its pointers by 
moving records if necessary, and then rewriting the record. If 
there are / pointers to be updated, then f+1 records must be 
read and written. This sounds like a lot of work, but this 
much work is done only when a record is created; if there are 
no record creations, then with the exception of the first 
access of a record via a pointer stored in another record, the 
accessing and updating functions will be as fast as on any 
other file management scheme. Therefore, since secondary 
storage is usually cheap but slow, choosing k<\ in a file 
management system allows us to trade off storage space 
against average record creation time. 

With a little more effort, k can even be made variable in 
our method, thus allowing a program to dynamically 
optimize its space-time tradeoff. For example, in a database 
management system a program might set £=0 during an 
initial load of the database because it knows that, even 



though there are many records being created, none are being 
let go, and therefore the continual copying of the collector 
will achieve no compaction. The function READ in LISP 
might want to exercise the same prerogative and for the same 
reason. Of course, any reduction of k should not take effect 
until the next flip to avoid overflowing storage before then. 

4. A User Program Stack 

If the user program utilizes its own stack as well as a 
bank of registers, the stack may (in theory) grow to an 
unbounded size and therefore cannot be wholly updated when 
the semispaces are flipped and still preserve a constant bound 
on the time for CONS. This problem may be trivially solved 
by simulating the stack in the heap (i.e. PUSH(x) s 
CONS(x,stack) and POP0 ■ CVR(stack)); this simulation will 
satisfy the bounded-time constraints of classical stack 
manipulation. However, this simulation has the unfortunate 
property that accessing items on the stack requires time 
proportional to their distance from the top. 

In order to maintain constant access time to elements 
deep in the stack, we keep stack-like allocation and 
deallocation strategies but perform the tracing of the stack in 
an incremental manner. We first fix the stack accessing 
routines so that the user program never sees pointers in 
fromspace. This change requires that the MOVE routine must 
be applied to any pointers which are picked up from the user 
stack. We must then change CONS to save the user stack 
pointer when the semispaces are flipped so that it knows 
which stack locations must be traced. Finally, the user stack 
POP routine must keep this saved pointer current to avoid 
tracing locations which are no longer on the user stack [28]. 

The only remaining question is how many stack 
locations are to be traced at every CONS. To guarantee that 
stack tracing will be finished before the next flip, we must 
choose the stack tracing ratio k' (the number of stack 
locations traced per CONS) so that the ratio k'lk is the same 
as the ratio of stack locations in use to cons cells in use. We 
recompute k f at each flip, because the "in use" statistics are 
available then. Due to this computation, a constant bound on 
the time for CONS exists only if the ratio of stack size to 
heap size is bounded, and it is proportional to that ratio. 



% Serial Real-Time System with User Stack. 

% The user stack resides in the array "ustk" and grows upward from "ustk[0]". The global variable SP is the user stack pointer 
% and points to the current top of the user stack. The global variable SS scans the user stack and points to the highest stack 
% level which has not yet been traced by the collector. 



integer SP init(0); 
integer SS init(0) ; 
procedure USER_PUSH(x) 
begin 

SP:=SP+1; 

UStk[SP] :=X 



% User stack pointer. 

% User stack scanner. 

% Push x onto user stack. 

% Note: x will not be in fromspace. 



pointer procedure USER_POP ( ) = 
move (ustk [SP] ) & 
begin 

SP:=SP-1; 

SS:=min(SS,SP) 
end; 

pointer procedure USER_GET (n) = 
ustk [SP-ri] : =move (ustk [SP-n] ) ; 



% Pop top value from user stack. 
% Move value if necessary. 

% then update stack pointer. 
% Keep stack scanner current. 



% Get nth element from top of stack. 
% Move and update if necessary. 
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pointer procedure CONS(x,y) s 
begin 
if B=T 
then begin 

if SS>0 or S<B 

then error; 
N:=flip(); 
SS:=SP; 

/c':=ceil(k*SS/N); 
for i=l to NR 

do R [i] : =move (R [i] ) ; 
x: =move (x) ; y : =move (y) ; 
end; 

for i=l to k' while S5>0 
do begin 

ustk [SS] : =move (ustk [SS] ) ; 
SS:=SS-1 
end; 

for i=l to k while S<B 
do begin 

S[0] :=move(S[0]) ; 
S[l] remove (S[l]) ; 
S:=S+2 ' 
end; 

if B=T then error; 
T:=T-2; 

T[0] :=x; T[l] :=y; 

r 

end; 

The preceding code exhibits these changes. 

The complexity involved in this conversion is 
essentially that necessary to make the serial real-time method 
work for several different spaces [27]. In such a system, each 
space is a contiguous area in the address space disjoint from 
the other spaces, and has its own representation conventions 
and allocation (and deallocation) strategies. The system of 
this section thus has two spaces, the heap and the user stack, 
which must be managed by cooperating routines. 

5. CDR-Coding (Compact List Representation) 

In this section, we discuss the interaction of our 
algorithm with a partial solution to the second big problem 
with list structures: their inefficient use of storage. Whereas 
a list of 5 elements in a language like Fortran or APL would 
require only a 5 element array, such a list in LISP requires 5 
cells having two pointers apiece. So-called "CDR-coding" 
[12,19] can reduce the storage cost of LISP lists by as much 
as 50%. The idea is simple: memory is divided up into equal- 
sized chunks called g's. Each Q is big enough to hold 2 bits . 
plus a pointer p to another 0. The 2 bits are decoded via the 
table: 

00 — NORMAL; CAR of this node is p; CDR is in the 

following Q. 

01— NIL; CAR of this rode is p; CDR is NIL. 

10— NEXT; CAR of this node is p; CDR is the 

following Q. 

1 1— EXTENDED; The cell extension located at p holds the 

car and cdr for this node.^ 

CDR-coding can reduce by 50% the storage requirements 
of a group of cells for which CDR is a 1-1 function whose 
range excludes non-nil atoms. This is a non-trivial saving, 



5 These conventions are slightly different from [19]. 



% Collect some, then allocate (x.y). 

% Check if free area is empty. 
% Interchange semispaces. 
% Check for memory overflow. 

% Set N to number of cells in use. 
% Start stack scan at top of stack. 
% Calculate stack trace effort. 

% Update user registers 
% and our arguments. 

% Move k' user stack elements and 
% update scan pointer. 



% Do k iterations of gc. 
% Trace and update car, cdr. 



% Actually create the cell. 

% Install car and cdr. 

% Return address of new cell. 



as all "dotless" s-expressions read in by the LISP reader have 
these properties. In fact, it has been found [12] that, after 
linearization, 98% of the non-NIL cdrs in several large LISP 
programs referred to the following cell. These savings are 
due to the fact that CDR-coding takes advantage of the 
implicit linear ordering of addresses in address space. 

What implications does this coding scheme have for the 
elementary list operations of LISP? Most operations must 
dispatch on the CDR code to compute their results, and 
RPLACD needs special handling. Consider RPLACD(/?,^). If 
p has a CDR code of NIL or NEXT, then it must be changed to 
EXTENDED, and the result of CONS(CAR(p),?) placed in p. 6 

The number of memory references in the elementary 
operations has been minimized by making the following 
policies [20]: 1) every EXTENDED ceil has a NORMAL 
extension; 2) the user program will never see a pointer to the 
extension of an EXTENDED cell; and 3) when COPY copies 
an EXTENDED cell, it reconstitutes it without an extension. 

CONS, CAR, CDR, RPLACA, and RPLACD must be 
changed to preserve these assertions, but EQ and ATOM 
require no changes from their non-CDR-coded versions. 
Since an EXTENDED cell cannot point to another EXTENDED 
cell, the forwarding of EXTENDED pointers need not be 
iterated. These policies seem to minimize memory references 
because each cell has a constant (between flips) canonical 
address, thereby avoiding normalization [30] by every 
primitive list operation. 

CDR-coding requires a compacting, linearizing garbage 
collector if it is to keep allocation simple (because it uses 
two different cell sizes) and take full advantage of the 
sequential coding efficiency. The MFYCA algorithm 



°We note in this context that if RPLACD is commonly 
used to destructively reverse a list — e.g. by LISP's 
"NREVERSE" — the system could also have a "PREVIOUS" 
CDR-code so that RPLACD need not call CONS so often. 
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presented above compacts, but does not linearize cdrs due to 
its breadth-first trace order. However, the trace order of an 
MFYCA collector can easily be modified at the cost of an 
additional pointer, PB. PB keeps track of the previous value 
of B (i.e. PB points to the last cell copied), so that tracing 
the cdr of the cell at PB will copy its successor into the next 
consecutive location (B), thus copying whole lists into 
successive contiguous locations. 

The meaning of the scan pointer S is then changed 
slightly so that it points to the next word which must be 



updated rather than the next cell. Finally, the trace routine is 
modified so that tracing the cdr of PB has priority over 
tracing the edge at S and the condition on the trace loop is 
modified to amortize both the copying effort (measured by 
movements of B) and the tracing effort (measured by 
movements of S) over all the CONSES. These modifications 
do not result in a depth-first trace order, but they do result in 
cdr-chains being traced to the end, with few interruptions. 
Thus an MFYCA collector can minimize the amount of 
memory needed by CDR-coded lists. 



% Serial Real-Time System with CDR-Coding 
pointer S; 
pointer PB; 
pointer L,H; 

pointer procedure CONS(x,y) s 
begin 

if T-B<2 
then begin 

if S<B then error , - 

flipO; 

for i=l to NR. 

do R[i] :=move(R[i] ) ; 
x: =move (x) ; y : =move (y) 
end; 

while (S+B)/2-L < k* (H-T) and S<B 
do if PB<B 

then PB: = (B&CDR(PB) ) ; 
else begin 

5[0] :=move(S[0] ) ; 
S: =5+1 
end; 

if B=T then error; 

T:=T-1; 

if y=nil 

then code(T) :=»NIL» 
else if y=T+l 

then coded 7 ) : = "NEXT" 
else begin 

if B=T then error; 
T:=T-1; 

COde(T) : = "NORMAL" ; 
T[l] :=y 
end; 

r[0] :=X; 
T 

end; 

pointer procedure CAR(x) = 
brplaca (x, move (bear (x) ) ) ; 
procedure RPLACA(x,y) s 

brplaca (x,y) ; 
pointer procedure BCAR(x) s 
if code (x) =" EXTENDED 11 
then (x[03 ) [03 
else x[03 ; 
pointer procedure BRPLACA (p, q) = 
if code (p)= "EXTENDED" 
then (p[03) [0] :=q 
else p[0] :=q; 
pointer procedure CDR(x) s 
brplacd (x, move (bedr (x) ) ) ; 



% Next cell whose car needs tracing. 
% Pointer to previous value of B. 
% Low and high limits of tospace. 
% Assertion: L<S<PB<B<T<H. 
% Create a new cell in tospace with 
% car of x and cdr ofy. 
% Flip when free area is exhausted. 
% This part is the same as usual. 
% Interchance semispaces. 



% Update user registers. 
% Update our arguments. 

% Trace and copy a measured amount. 
% Extend current list, if possible. 
% CDR will trace this edge for us. 

% Update this edge. 

% Step S over this word. 

% Check for memory overflow. 
% Create new cell at top of free area. 

% If y is special case, 
% then create a short cell 
% with appropriate cdr-code, 
% Otherwise, create a normal cell. 
% Need more space for the cdr. 

% Set in "NORMAL" cdr-code. 
% Set in the cdr. 

% Set the car in the new cell. 
% Return the new cell. 

% CAR must move cell it uncovered. 

% Update this edge. 

% *[0]:=}>. May require subtlety. 

% Basic car; dispatch on CDR-code. 
% Type "EXTENDED" means 
% indirect car. 

% All other types have normal cars. 
% Basic rplaca; dispatch on CDR-code. 
% If extended cell, clobber indirectly. 

% All others have normal car. 

% CDR moves uncovered cell, but updates 

% only if still possible after move. 
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procedure RPLACD {x, y) a 
begin 

if code(x)="NIL" or code (x) = "NEXT" 
then 

begin pointer p; 

p:=CONS(CAR(x) , "DUMMY") ; 
x : =move ( x) ; y : =move (y) ; 
X[0] :=p; 

code(x) := "EXTENDED" 
end; 
brplacd (x, y) 
end; 

pointer procedure BCDR(x) = 
if code (x) =" NORMAL" then x[l] 
else if code(x)="NIL" then nil 
else if code (x) =" NEXT" then x+1 
else (x[0]) [1] ; 

pointer procedure BRPLACD (p, q) = 
if code (p) =" EXTENDED" 

then (p[03) [1] :=g 
else if code (p) = "NORMAL" 
then p[l] :=g 

else g; 

integer procedure SIZE(p) = 

if code (p) = "NORMAL" 
then 2 else 1; 
pointer procedure COPY(p) s 

begin 

if PB=B-2 and bcdr (PB)=p 
then begin 

COde(PB) :="NEXT"; 
B:=B-1 
end; 
if bcdr(p)=nil 

then code{B) :="NIL" 
else code(B) := "NORMAL" ; 
B[0] :=bcar(p); 
brplacd (B, bcdr (p) ) ; 
PB:=B; 

B:=B+size (B) ; 
if B>T then error,- 
PB 
end; 

% Procedures not redefined here are as before. 



% x [ 1 ] :=y . May require brute force. 

% Test for screw cases. 

% EXTENDED case will fall through. 

% Extend the cell x. 

% Construct guaranteed NORMAL cell. 

% Update arguments in case CONS flipped. 

% Leave forwarding address in old cell. 

% The old cell has now been extended. 

% Finally replace the cdr. 

% Basic cdr; dispatch on CDR-code. 

% NORMAL cells have a second word. 

% Interpret NIL CDR-code. 

% Interpret NEXT CDR-code. 

% EXTENDED cells point to NORMAL cells. 

% Handle easy cases of RPLACD. 

% We have extended cell; 

% clobber the NORMAL indirect. 

% The easiest case of all. 

% In all cases, return q as value. 

% Find the size of p from its CDR-code. 

% "NIL", "NEXT" and "EXTENDED" all have 

% size(p)=l. 

% Copy the cell p; append to current 

% train if possible. 

% See if we can hop this NEXT train. 

% Convert NORMAL cell to NEXT cell. 
% Reuse extra space now available. 

% Create a NIL cell, if appropriate. 

% Otherwise, all cells are NORMAL. 

% Copy over car; 

% and cdr too, if necessary. 

% PB is end of current NEXT train. 

% Step B over newly copied cell, 

% check for memory overflow, 

% and return pointer to new copy. 



The size of the tospace needed for CDR-coding is (1+m) 
times the amount of space actually used in fromspace. With a 
coding efficiency improvement of e over the classical storage 
of LISP cells, and under equilibrium conditions, we have the 
inequality: 

Maximum SRTC Storage Required < M?(2+2m) 

Since we have claimed that e=.5, we get the following 
estimate: 

SRTC Storage Required ~ N(\+m) (!) 

But this latter expression is less than the bound 
computed for MKSMW. Thus, CDR-coding has given us back 
the factor of 2 that the copying garbage collector took away. 

The real-time properties of our algorithm have not been 
affected in the least by CDR-coding; in fact, good microcode 
might be able to process CDR-coded lists faster than normal 
lists since fewer references to main memory are needed. 

CDR-coding is not the final answer to the coding 
efficiency problems of list storage, because far more compact 



codes can be devised to store LISP's s-expressions. For 
example, both the car and cdr of a cell could be coded by 
relative offsets rather than full pointers [12]. However, a 
more compact code would represent some cells in so few bits 
that the pointer we need for a forwarding address would not 
fit, rendering our scheme unworkable. Part of the problem is 
inherent in LISP's small cell size; small arrays can perform 
much better. 

6. Vectors and Arrays 

Arrays can be included quite easily into our framework of 
incremental garbage collection by simply enclosing certain 
parts of the collector program in loops which iterate through 
all the pointers in the array, not just the first and second. The 
convergence of the method with regard to storage space can 
also be proved and bounds derived. However, the method can 
no longer claim to be real-time because neither the time taken 
by the array allocation function (ARRAY-CONS) nor the time 
taken by the array element accessing function is bounded by a 
constant. This unbounded behavior has two sources: copying 
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an array and tracing all its pointers both require time 
proportional to the length of the array. Therefore, if these 
operations are included in a computer as noninterruptable 
primitive instructions, hard interrupt response time bounds 
for that computer will not exist. However, an arbitrary bound 
(say 10), placed on the size of all arrays by either the system 
or the programmer, allows such bounds to be derived. 

A scheme which overcomes some of these problems has 
been devised [31]. In it, each vector is given a special link 
word which holds either a forwarding pointer (for vectors in 
fromspace which have been partially moved), a backward link 
(for incomplete vectors in tospace), or NIL (for complete 
vectors). MOVE no longer copies the whole array, but only 
allocates space and installs the forward and backward links. 
Any reference to an element of a moved but incompletely 
updated vector will follow the backward link to the fromspace 
and access the corresponding element there. When the scan 
pointer in the tospace encounters such a vector, its elements 
are incrementally updated by applying MOVE to the 
corresponding elements of its old self; after the new one is 
complete, its link is set to NIL. Element accesses to 
incomplete vectors compare the scan pointer to the element 
address; access is made to the old (new) vector if the scan 
pointer is less (greater or equal). Tracing and updating 
exactly k*n vector elements (not necessarily all from the 
same vector) upon every allocation of a vector of length n 
guarantees convergence. 

Steele's scheme has the following properties: the time 
for referencing an element of any cell or vector is bounded by 
a constant while the time to allocate a new object of size n is 
bounded by c\kn+C2 y for some constants c\ and C2- Hence, a 
sequence of list and vector operations can be given tight time 
bounds. 

7. Hash Tables and Hash Links 

Some recent artificial intelligence programs written in 
LISP have found it convenient to associate property lists 
with list cells as well as symbolic atoms. Since few cells 
actually have property lists, it is a waste of storage to 
allocate to every cell a pointer which points to the cell's 
property list. Therefore, it has been suggested [9] that one 
bit be set aside in every cell to indicate whether the cell has a 
property list. If so, the property list can be found by looking 
in a hash table, using the address of the list cell as the key. 

Such a table requires special handling in systems having 
a relocating garbage collector. Our copying scheme gives 
each semispace its own hash table, and when a cell is copied 
over into tospace, its property list pointer is entered in the 
"to" table under the cell's new address. Then when the copied 
cell is encountered by the "scan" pointer, its property list 
pointer is updated along with its normal components. A 
"CDR-coding" system with two "scan" pointers should also 
keep a third for tracing property list pointers to prevent 
property lists from destroying chains of "next"-type cells. 

8. Reference Counting 

In this section we consider whether reference counting 
can be used as a method of storage reclamation to process 
lists in real time; i.e. we try to answer the question "is 
reference counting worth the effort in a real-time system, and 
if so, under what conditions?" 

A classical reference count system [13,34] keeps for each 
cell a count of the number of pointers which point (refer) to 
that cell; i.e. its in-degree. This reference count (refcount) is 
continually updated as pointers to the cell are created and 



destroyed, and when it drops to zero, the cell is reclaimed. 
When reclaimed, the refcounts of any daughter cells it points 
to are decremented, and are also reclaimed if zero, in a 
recursive manner. 

Reference counting appears to be unsuitable for real-time 
applications because a potentially unbounded amount of work 
must be done when a cell is let go. However, if a free stack is 
used to keep track of freed objects instead of a free list [34], 
the newly freed cell is simply pushed onto the free stack. 
When a cell is needed, it is popped off the stack, the refcounts 
of its daughters are decremented, and if zero, the daughters are 
pushed back onto the stack. Then the cell which was popped 
is returned. In this way, only a bounded amount of work 
needs to be done on each allocation. 

We now consider the storage requirements of a reference 
counting (RC) system. In addition to the memory for N cells, 
we also need room for N refcounts and a stack. Since the 
refcounts can go as high as N, they require approximately the 
same space as a pointer. So we have: 

Maximum RC Space Required 

< \ .5N+ the size of the "free stack" 

The worst case stack depth is N. However, whenever a 
cell is on the stack, its refcount is zero, so we can thread the 
stack through the unused refcounts! So we now have: 

Maximum RC Space Required < 1 .5 N 

Reference count systems have the drawback that directed 
cycles of pointers cannot be reclaimed. It has been suggested 
[15,22] that refcounts be used as the "primary" method of 
reclamation, using garbage collection (GC) as a fallback 
method when that fails. Since RC will not have to reclaim 
everything and since the average refcount is often very small, 
it has also been suggested that a truncated refcount (a bounded 
counter which sticks at its highest value if it overflows) be 
used to save space. 

We say that garbage in a combination RC and GC system 
is ref-degradable if and only if it can be reclaimed by 
refcounts alone. Cells whose truncated refcounts are struck 
are therefore non-ref-degradable. 

What is the effect of a dual system in terms of 
performance? Whatever the RC system is able to recycle puts 
off flipping that much longer. By the time a flip happens in 
such a two level system, there is no ref-degradable garbage 
left in tospace. Therefore, the turnover of the semispaces is 
slowed. 

How much memory does the dual system require? If 
truncated refcounts are used, the free stack cannot be threaded 
through a cell's refcount because it is not big enough to hold 
a pointer. Therefore, using this method and assuming only a 
few bits worth of truncated refcount per cell, we have: 

Maximum SRT + RC Space Required 

£ N(2+2m) + RC free stack £ N(2.5+2m) 

So it appears that we have lost something by adding 
refcounts (even tiny ones), because we still need room for the 
free stack. 

Let us now examine more closely the average timing of 
CONS under a pure RC versus a pure SRT system. The average 
time for CONS under the RC system is the same as the 
maximum time since there is no freedom in the algorithm. 
The time for CONS in SRT is c\k + C2, where c\ and ci are 
constants. Now C2 is simply the time to allocate space from a 
contiguous block of free storage. Certainly incrementing a 
pointer is much less complex than popping a cell from a 
stack, following its pointers, decrementing their refcounts, 
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and if zero, pushing them onto the stack. Therefore, we can 
choose k small enough^ so that the average time to perform 
CONS with our SRT method is smaller than the average time 
to perform CONS in an RC system. 8 This analysis does not 
even count the additional time needed to keep the refcounts 
updated. Of course, the storage required for our "pure" SRT 
system may be many times the storage of the RC system, but 
SRT will have a smaller average CONS time. 

Since this seems counterintuitive, or at least reactionary 
(given the current penchant for recycling), we give a 
rationale for why it is so. Reference counting traces the 
garbage cells, while normal garbage collection traces the 
accessible cells. Once the number of garbage cells exceeds 
the number of accessible cells in a region of storage, it is 
faster to copy the accessible cells out of the region and 
recycle it whole. When m>l, reference counting cannot 
compete timewise with garbage collection because RC must 
trace a cell for every cell allocated while GC traces on the 
average only a fraction {Mm) of a cell for every cell allocated. 

On the other hand, if we wish to minimize storage by 
making m<\, a dual scheme with truncated refcounts should 
reduce the average CONS time over that in the pure scheme. 
However, CDR-coded lists and other variable sized objects 
cannot be easily managed with reference counting because the 
object at the top of the free stack is not necessarily the right 
size for the current allocation. Thus, CDR-coding can reduce 
the storage requirement of a "pure" scheme below that of a 
"dual" system with the same m. But even on a system with 
objects of uniform size, we are skeptical whether the 
increased average efficiency of CONS in the "dual" system 
will offset the increase in k needed to keep the storage 
requirements the same as the "pure" system. We conclude 
that, at least on a real memory computer, reference counting 
probably is not a good storage management technique unless 
one a) has uniformly sized objects; b) uses full counts; and c) 
guarantees no cycles. 

This is not to say that reference counts are not useful. If 
the LISP language were extended with a function to return the 
current refcount of an object, and suitably clean semantics 
were associated with this function, then one might be able to 
make use of this information within the user program to 
speed up certain algorithms, such as structure tracing or 
backtracking, d la Bobrow and Wegbreit [8). This author is 
not aware of any language which makes this information 
available; if it were available, good programmers would 
certainly find a use for it. 

9. The Costs of Real-Time List Processing 

The amount of storage and time used by a real-time list 
processing system can be compared with that used by a 
classical list processing system using garbage collection on 
tasks not requiring bounded response times. The storage 
required by a classical noncompacting garbage collector is 
JV(l+u), if the system uses the Deutsch-Schorr-Waite (DSW) 
[22,p.417-418] marking algorithm, and JV(1.5+n) if it uses a 
normal stack, for some positive \i. If CDR-coding is used, 
copying must be done; the storage requirement is then 
Ne(2+2\i), where e is the efficiency of the coding. Since e is 
near .5 [12], the requirement is about iV(l+H), so that CDR- 



' Section 3 deals with nonintegral k's. 

8 We can discount the additional time occasionally 
required by CAR and CDR in our method because any 
relocation and pointer updating done by them is work that we 
have charged to CONS, and does not have to be recounted. 



coding requires approximately the same space as DSW. 
Comparing these expressions with those derived earlier for 
our real-time algorithms, we find that processing LISP lists 
in real-time requires no more space than a non-real-time* 
system using DSW. If larger non-uniform-sized objects like 
arrays must be managed, real-time capability requires no more 
space than the MFYCA system, since a copying collector is 
already assumed. 

The average time requirement for CONS in our real-time 
system is virtually identical to that in a classical MFYCA 
system using the same cell representation and the same 
amount of storage. This is because 1) a classical system can 
do \xN CONSES after doing a garbage collection which marks 
AT nodes — thus giving an average cons/mark ratio of \l and 
allowing us to identify \i with m — and 2) garbage collection 
in our real-time system is almost identical to that in the 
MFYCA system, except that it is done incrementally during 
calls to CONS. In other words, the user program pays for the 
cost of a cell's reclamation at the time the cell is created by 
tracing some other cell. 

CAR and CDR are a bit slower, because they must test 
whether the value to be returned is in fromspace. However, as 
noted above, any cell movement done inside CAR or CDR 
should not be charged to CAR or CDR because it is work 
which the collector would otherwise have to do and therefore 
has already been accounted for in our analysis of CONS. 
Therefore, CAR and CDR are only slower by the time required 
for the semispace test. 9 

Since RPLACA, RPLACD, EQ, and ATOM are unchanged 
from their classical versions, their timings are also 
unchanged. 

The overhead calculated for our serial system can be 
compared to that in the parallel system of [33]. According to 
these calculations, a parallel garbage collector requires 
significantly more total time than a nonparallel collector. 
But this contradiction disappears when it is realized that the 
author's parallel collector continues tracing even in the 
absence of any cell creation activity. Since our system keys 
collector activity to cell creation, the collector effort is about 
the same as on a non-real-time system. 

10. Applications 

1) A Fixed Size, Real Memory Computer. 

This application covers the classical 7090 LISP [25] as 
well as a LISP for a microcomputer. We conceive of even 16- 
bit microcomputers utilizing this algorithm for real-time 
process control or simulation tasks. Each of the list 
processing primitives is intended to run with interrupts 
inhibited, so that all interrupt processing can make use of list 
storage for its buffers and other needs. Multiple processes 
may also use these primitives so long as CONS, CAR, and 
CDR are used by one process at a time; i.e. they are protected 
by one system-wide lock. Of course, the system must be 
aware of the registers of every process. 

For these real memory applications, we want to put as 
much of the available storage under the management of the 
algorithm as possible. Thus, both atoms (here we mean the 
whole LISP atom-complex, not just the print-name) and list 
nodes are stored in the semispaces. CDR-coding is usually a 
good idea to save memory, but unless the bit testing is done 



9 In Greenblatt's LISP machine [19], the virtual memory 
map performs the semispace test as an integral part of address 
translation. 
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in microcode, it may be faster to use normal cells and increase 
the parameter k to keep the storage size small. 

The average CONS time is reduced by putting off flipping 
until all of the free space in tospace is exhausted, i.e. B-T. 
Thus, after all moving and tracing is done, i.e. 5=2?, 
allocation is trivial until B=T. As a result, the average CONS 
time in our real-time system is approximately the same as 
that in a classical system. Of course, with a memory size of 
2M, the maximum number of cells that can be safely managed 
is still Mk/(k+\). 

2) A Virtual Memory Computer. 

The current epitome of this application is Multics LISP 
with an address space of 2 36 (=10") 36-bit words, room for 
billions of list cells. The problem here is not in reclaiming 
cells that are let go, but keeping accessible cells compact so 
that they occupy as few pages of real memory as possible. 
The MFYCA algorithm does this admirably and ours does 
almost as well. 

Our scheme is still real-time on a virtual memory 
computer, but the bounds on the elementary list operations 
now have the order of magnitude of secondary storage 
operations. 

There are some problems, however. Unlike MFYCA, 
wherein both semi spaces were used only during garbage 
collection, our method requires that they both be active (i.e. 
partially in real memory) at all times. This may increase the 
average working set size. A careful analysis needs to be made 
of our algorithm in order to estimate the additional cost of 
incremental garbage collection. Brief consideration tells us 
that the active address space varies from a minimum of 
N(l+m) just before a flip to N(2+2m) just after. Since at a flip 
the user program registers are updated in numerical order, 
relatively constant pointers should be placed in the lower 
numbered registers to keep the trace order of constant list 
structure similar between flips. If the average size of an 
object is much larger than the size of a pointer, the working 
set may also be reduced by storing the forwarding addresses in 
a separate table instead of in the old objects in fromspace [7]. 

In a virtual memory environment, the active address 
space will automatically expand and contract in response to 
changes in the number of accessible cells if 1) FLIP re-adjusts 
the size of fromspace to (l+m)[cells in tospace] just before 
interchanging the semispaces; and 2) flipping occurs when 
tracing finishes rather than when B meets T. This policy, 
plus a smaller k than a real memory computer would use, 
should give both a fast CONS and a tolerable working set 
size. The parameter k can also be dynamically adjusted to 
optimize either running time (including paging) or cost 
according to some pricing policy by following an analysis 
similar to that of Hoare and others [2,10,21]. 

3) A Database Management System. 

We conceive of a huge database having millions of 
records, which may contain pointers to other records, being 
managed by our algorithm. Examples of such databases are a 
bill of materials database for the Apollo Project, or a 
complete semantic dictionary and thesaurus of English for a 
language understanding program. Performing a classical 
garbage collection on such a databank would be out of the 
question, since it might require days or weeks to complete, 
given current disk technology. 

Some of these large database systems currently depend on 
reference counts for storage reclamation, and so do not allow 
directed cycles of pointers. Since our method performs 
general garbage collection, this restriction could be dropped. 
Moreover, given enough space, our algorithm can take even 



less time than a reference count system. When compared with 
a classical garbage collection system, our method would not 
save any total time in processing transactions against such a 
database, but it would avoid the catastrophic consequences of 
a garbage collection during a period of heavy demand. 

This case is very much like case 1, the real memory 
computer, because we assume that the database is orders of 
magnitude too big to fit into primary memory and thus that 
there is little hope for a speedup from the locality of reference 
effect. "Read memory" and "store memory" instructions here 
apply to secondary storage; the constant bounds for the 
elementary operations are now on the order of milliseconds 
rather than microseconds. Therefore, almost everything that 
we say about real memory implementations also applies to 
large database implementations, except that space is cheaper 
and time is more dear. 

4) A Totally New Computer Architecture. 

We conceive of an architecture in which a CPU is 
connected to a list memory instead of a random access 
memory. Machines of this architecture are similar to 
"linking automata" [22, p. 462-463] and "storage 
modification machines" [29]. At the interface between the 
CPU and the memory sits a bank of pointer registers, which 
point at particular cells in the list memory. Instead of a bus 
which communicates both addresses and values, with read and 
write commands, the memory would have only a data bus and 
commands like CAR, CDR, CONS, RPLACA, RPLACD, EQ, 
and ATOM, whose arguments and returned values would be in 
the pointer registers. The CPU would not have access to the 
bit strings stored in the pointer registers, except those which 
pointed to atoms (objects outside both fromspace and 
tospace). This restriction is necessary to keep the CPU from 
depending upon memory addresses which might be changed 
by the management algorithm without the CPU's knowledge. 

An advantage of such a system over random access 
memory is the elimination of the huge address bus that is 
normally needed between the CPU and the memory, since 
addresses are not dealt with directly by the CPU. As the 
number of bits on a chip increases, the number of address 
lines and supporting logic becomes a critical factor. 

Our method of garbage collection can also be used with a 
random access write-once memory by appending an extra 
word to each cell which holds the forwarding address when 
that cell is eventually moved. Using such a system, the cells 
in tospace cannot be updated until they are moved to the new 
tospace after the next flip. In other words, three semispaces 
need to be active at all times. In addition to these changes, 
RPLACA and RPLACD must actually perform a CONS, just 
like RPLACD occasionally does in our CDR-coding system. 
Perhaps the write-once property can eliminate the need for 
transaction journals and backup tapes. 

11. Conclusions and Future Work 

We have exhibited a method for doing list processing on 
a serial computer in a real-time environment where the time 
required by all of the elementary list operations must be 
bounded by a constant which is independent of the number of 
list cells in use. This algorithm was made possible through: 
1) a new proof of correctness of parallel garbage collection 
based on tie assertion that the user program sees only marked 
cells; 2) the realization that the collection effort must be 
proportional to new cell creation; and 3) the belief that the 
complex interaction required by these policies makes parallel 
collection unwieldy. We have also exhibited extensions of 
this algorithm to handle a user program stack, "CDR- 



12 



Communications of the ACM 21, 4 (April 1978), 280-294. 



coding," vectors of contiguous words, and hash linking. 
Therefore, we consider our system to be an attractive 
alternative to reference counting for real-time storage 
management and have shown that, given enough storage, our 
method will outperform a reference count system, without 
requiring the topological restrictions of that system. 

Our real-time scheme is strikingly similar to the 
incremental garbage collector proposed independently by 
Barbacci for a microcoded LISP machine [3]. However, his 
non-real-time proposal differs in the key points listed above. 
Our system will itself appear in microcoded form in 
Greenblatt's LISP machine [19]. 

There is still some freedom in our algorithm which has 
not been explored. The order in which the cells are traced is 
not important for the algorithm's correctness or real-time 
properties. The average properties of the algorithm when run 
on a virtual memory machine need to be extensively 
investigated. 

The space required by our algorithm may be excessive for 
some applications. Perhaps a synthesis of the area concept 
[6,7] with our method could reduce the memory requirements 
of a list processing system while preserving the bounded- 
time properties of the elementary operations. 

A garbage collection algorithm can be viewed as a means 
for converting a Von Neumann-style random access memory 
(with "side-effects" [25]) into a list memory (without "side- 
effects"). Perhaps a list memory can be implemented directly 
in hardware which uses considerably less energy by taking 
advantage of the lack of side-effects in list operations [4]. 

Acknowledgments. I wish to thank the people at 
M.I.T.'s Artificial Intelligence Laboratory and Laboratory for 
Computer Science (formerly Project MAC) for their time 
discussing these ideas, and especially Peter Bishop, John 
DeTreville, Richard Greenblatt, Carl Hewitt, Al Mok, Guy 
Steele, and the referees for their copious comments and 
helpful suggestions after reading early versions of this paper. 
I also wish to thank John McCarthy for ignoring David 
Hilbert's advice about "leaving elegance to the tailors" when 
he created the LISP language. 

Received September 1976; revised May 1977 
References 

1. Arnborg, S. Storage administration in a virtual memory 
SIMULA system. BIT 12 (1912), 125-141. 

2. Arnborg, S. Optimal memory management in a system with 

garbage collection. BIT 14 (1974), 375-381. 

3. Barbacci, M. A LISP Processor for C.ai. Memo 
CMU-CS-71-103, Computer Sci. Dept., Carnegie-Mellon. 
Pittsburgh, Pa., 1971. 

4. Bennett, C.H. Logical reversibility of computation. IBM 
J. Res. Develop. 17 (1973), 525. 

5. Birtwistle, G.M., Dahl, O.-J., Myhrhaug, B, and Nygaard, 
K. Simula Begin. Auerbach, Philadelphia, Pa., 1973. 

6. Bishop, P.B. Garbage collection in a very large address 
space. Working Paper 111, M.I.T. A.I. Lab., M.I.T., 
Sept. 1975. 

7. Bishop, P.B. Computer systems with a very large address 

space and garbage collection. Ph.D. Th., TR-178, MIT 
Lab. for Computer Sci., Cambridge, Mass., May 1977. 
Forthcoming. 

8. Bobrow, D.G. and Wegbreit, B. A model and stack 
implementation of multiple environments. Comm. ACM 
16, 10 (Oct. 1973), 591-603. 

9. Bobrow, D.G. A note on hash linking. Comm. ACM 18, 1 

(July 1975), 413-415. 



10. Campbell, J.A. Optimal use of storage in a simple model 
of garbage collection. Inform. Processing Letters 3, 2 
(Nov. 1974), 37-38. 

11. Cheney, C.J. A nonrecursive list compacting algorithm. 
Comm. ACM 13, 11 (Nov. 1970), 677-678. 

12. Clark, D.W., and Green, C.C. An empirical study of list 
structure in LISP. Comm. ACM 20, 2 (Feb. 1977), 78-87. 

13. Collins, G.E, A method for overlapping and erasure of 
lists. Comm. ACM 3,12 (Dec. 1960), 655-657. 

14. Dahl, 0.-J., and Nygaard, K. SIMULA— an ALGOL-based 
simulation language. Comm. ACM 9, 9 (Sept. 1966), 
671-678. 

15. Deutsch, L.P., and Bobrow, D.G. An efficient, 
incremental, automatic garbage collector. Comm. ACM 
19, 9 (Sept. 1976), 522-526. 

16. Dijkstra, E.W., Lamport, L., Martin, A.J., Scholten, C. 
S., Steffens, E.F.M. On-the-fly garbage collection: An 
exercise in cooperation. E. W.Dijkstra note EWD496, 
June 1975. 

17. Dijkstra, E.W. After many a sobering experience. E.W. 
Dijkstra note EWD500. 

18. Fenichel, R.R., and Yochelson, J.C. A LISP garbage- 
collector for virtual-memory computer systems. Comm. 
ACM 12, 11 (Nov. 1969), 611-612. 

19. Greenblatt, R. LISP Machine Progress Report memo 444. 
A.I. Lab., M.I.T., Cambridge, Mass., Aug. 1977. 

20. Greenblatt, R. Private communication, Feb. 1977. 

21. Hoare, C.A.R. Optimization of store size for garbage 
collection. Inform. Processing Letters 2 (1974), 165- 
166. 

22. Knuth, D.E. The Art of Computer Programming, Vol. I: 
Fundamental Algorithms. Addison-Wesley, Reading, 
Mass., 1968. 

23. Lamport, L. Garbage collection with multiple processes: 
An exercise in parallelism. CA-7602-251 1, Mass. 
Computer Associates, Wakefield, Mass., Feb. 1976. 

24. Lamport, L. On-the-fly garbage collection: Once more 
with rigor. CA-7508-161 1, Mass. Computer Associates, 
Wakefield, Mass., Aug. 1975. 

25. McCarthy, J., et al. LISP 1.5 Programmer's Manual. 
M.I.T. Press, Cambridge, Mass., 1965. 

26. Minsky, M.L. A LISP garbage collector algorithm using 
serial secondary storage. Memo 58, M.I.T. A.I. Lab., 
M.I.T., Cambridge, Mass., Oct. 1963. 

27. Moon, D.A. MACLISP Reference Manual. Project MAC, 
M.I.T., Cambridge, Mass., December 1975. 

28. Muller, K.G. On the feasibility of concurrent garbage 
collection. Ph.D. Th., Tech. Hogeschool Delft, The 
Netherlands, March 1976 (in English). 

29. Schonhage, A. Real-time simulation of multidimensional 
Turing machines by storage modification machines. 
TM-37, Project MAC, M.I.T., Cambridge, Mass., Dec. 
1973. 

30. Steele, G.L. Jr. Multiprocessing compactifying garbage 
collection. Comm. ACM 18, 9 (Sept. 1975), 495-508. 

31. Steele, G.L. Jr. Private communication, March 1977. 

32. Teitelman, W., et al INTERLISP Reference Manual. 
Xerox Palo Alto Res. Ctr., Palo Alto, Calif, 1974. 

33. Wadler, P.L. Analysis of an algorithm for real-time 
garbage collection. Comm. ACM 19, 9 (Sept. 1976), 
491-500. 

34. Weizenbaum, J. Symmetric list processor. Comm. ACM 
6 9 9 (Sept. 1963), 524-544. 



13 



